Abstract: Failure Detection, Isolation and Recovery (FDIR) in the International Space
Station Alpha (ISSA) requires timely monitoring and diagnosis of failures so that recovery
actions can be employed to safeguard the mission and the lives of the crew. Traditional
methods for representing domain knowledge and for diagnosis prove ineffectual
because of the scale, complexity and dynamics of ISSA. A model-based approach to
representing systems and to diagnosis is an attractive and feasible solution. We have developed
and field tested a model-based, real-time, robust monitoring and diagnostic system for ISSA
and other aerospace systems. The system is represented using hierarchical, multiple-aspect
models, which include representations of the functional structure as well as the physical
component assemblies. A discretized model of the failures and their effects is represented
using timed failure propagation graphs. The monitoring mechanism is modeled using a
discretized sensor space, with mechanisms for sensor validation. The diagnostic reasoning
applies structural and temporal constraints to generate and validate fault hypotheses
using the “predictor-corrector” principle. The diagnosis is generated in real time
amid an evolving alarm scenario and uses a progressive deepening control strategy. The
robust diagnostic system has been tested and demonstrated using ISSA models obtained from
the Boeing Company.
aspects; program synthesis; sensor failure


INTRODUCTION: An increasingly competitive aerospace market requires computer-integrated
Failure Detection, Isolation and Recovery (FDIR) systems to perform complex and
sophisticated analyses capable of providing real-time embedded vehicle
health management. Simultaneously, a general trend in the evolution of complex, large-scale,
computer-integrated systems is the rapidly increasing use of design-time models in
system operation. The goal is to synthesize vehicle health management systems that are
formally and automatically derived from the integrated model sets developed during
vehicle design, development, build, test, and verification.

In this paper we describe a model-integrated approach to the synthesis of FDIR systems for
aerospace vehicles. The model-integrated approach is based upon the MultiGraph Architecture
(MGA) [15]. The primary specifications for FDIR come from the International Space
Station Alpha (ISSA) program requirements, though the FDIR tools and formalisms described
here could be applied (with appropriate modifications) to the task of health management for most
large-scale, complex, computer-integrated systems. The FDIR system consists of modeling
formalisms and a health management system synthesized from the models of the artifact.

FDIR MODELING PARADIGM: The practical use of a model-integrated
system is limited by the “goodness” of the models themselves, which in turn
is influenced by the formalism used for modeling. Thus, one must develop modeling
formalisms that capture the essence of the system being modeled and the FDIR requirements.
One must also recognize that there is a critical need for software technology that
makes high-performance computing and communication capabilities accessible to end-users.
Systems engineers need domain-specific modeling and analysis environments that support the
building and verification of vehicle fault models, provide interfaces to engineering databases
and systems engineering tools, and allow the synthesis of FDIR systems that are consistent
with the vehicle models. Further, there typically are mature engineering disciplines
underlying the design and systems engineering. Thus, the modeling paradigms are not
“negotiable”: systems engineers need to be supported by the rich, domain-specific concepts,
relations, and composition principles routinely used in the field.

The main challenges in using a model-based approach for FDIR in large-scale heterogeneous
dynamic systems are the following:

1. The size of the systems, in terms of the number of components and the complexity of
physical processes and their interactions, can be large. In providing models for system-wide
diagnosis, scalability of the modeling technique becomes a major issue.

2. Design of such systems involves different engineering disciplines with different focuses
and tools. In FDIR modeling, four such disciplines are identified: signal, fluid,
electrical, and mechanical.

3.  The  source  of  failures  may  be  outside  of  the  system  boundary.  Propagating  effects  of 
external  disturbances  must  be  traced. 

4.  The  primary  goal  of  diagnosis  in  critical  systems  is  the  prevention  of  the  occurrence  of 
critical  failures.  Prediction  of  the  propagation  of  discrepancies  requires  not  only  the 
spatial  but  also  the  temporal  isolation  of  fault  events.  For  this  purpose,  steady-state 
models  are  often  useless,  because  processes  may  only  slowly  converge  to  steady-state 
and  because  steady-state  models  do  not  capture  the  dynamics  of  fault  propagation. 

5.  Fault  diagnosis  is  based  on  the  observation  of  the  behavior  of  the  plant  during  a fault 
incidence.  Consequently,  the  models  to  be  used  in  fault  diagnosis  should  capture  the 
dynamic  behavior  of  processes  when  it  is  out  of  the  normal  operation  range.  Needless 
to  say,  modeling  uncertainties  in  these  regions  are  even  more  significant  than  in  the 
normal  operation  range. 


6.  The  FDIR  system  must  be  able  to  reason  about  abrupt  faults  and  input  disturbances, 
which  means  that  assumptions  about  the  system  (valid  only  during  normal  operation) 
become  invalid  and  unusable. 

7.  Faults  propagate  through  the  system.  That  is,  the  effects  of  a fault  rarely  tend  to  be 
localized  unless  specific  measures  are  taken.  The  goal  of  FDIR  is  to  contain  and  rectify 
the  faults  locally.  Thus  the  propagating  effects  of  faults  must  be  modeled. 


The first two challenges listed above address issues common to all complex, heterogeneous,
large-scale systems, more or less independently of the application. In other words,
these issues are not specific to FDIR, but arise in control, simulation and many other
applications, and relate to the formalisms used for the overall organization of system models. The
rest of the challenges listed above arise out of the FDIR task specifically. These issues are
addressed in the formalisms used for fault models, which are a subset of system models. We
will first give an overview of the organization of system models, followed by a description of
the fault models.

Hierarchical,  Multiple  Aspect,  Discipline  Oriented  System  Models:  Because  our 
goal  is  to  model  engineering  systems,  the  modeling  technique  should  utilize  the  well  known 
engineering  techniques  to  manage  complexity. 

One of the primary model structuring methods is to focus on selected types of interactions,
i.e., to model a system from multiple aspects. Different modeling aspects use different
concepts (e.g., the physical structure is defined in terms of assemblies and sub-assemblies, while
the functional structure of a temperature control system is defined in terms of material
and energy flows). Each aspect may in turn be sub-divided into views that
contain discipline-oriented information. The models are typically organized into decomposition
hierarchies that control the level of detail shown. On each level, the system is modeled as
an aggregate of connected sub-systems. The types of the connections are determined by the
modeling aspect and view. The subsystems are connected through an interface, which defines
their boundaries and separates the internal and external environments. The decomposition
hierarchies and the connected set of subsystems on each level constitute the structural model
of the system. Each subsystem can also be characterized in terms of the relationships among
its input/output quantities. These models are called behavioral models.
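As an illustration only, the structural model described above (a decomposition hierarchy whose levels hold interface ports and view-specific connections) could be held in a small recursive data structure. The sketch below is not the MGA representation; the class name `Subsystem`, its fields, and the thermal-loop example are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Subsystem:
    """One node of a decomposition hierarchy (hypothetical structure)."""
    name: str
    # Interface ports that separate the internal and external environments.
    ports: List[str] = field(default_factory=list)
    # Sub-systems one level down in the hierarchy.
    children: List["Subsystem"] = field(default_factory=list)
    # Connections among children, grouped by discipline view:
    # view name -> list of (source "child.port", destination "child.port").
    connections: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)

    def depth(self) -> int:
        """Number of levels in the hierarchy rooted at this node."""
        return 1 + max((child.depth() for child in self.children), default=0)

    def connections_in_view(self, view: str) -> List[Tuple[str, str]]:
        """Connections visible when only one discipline view is selected."""
        return list(self.connections.get(view, []))


# Illustrative two-level model: a thermal loop made of a pump and a heat exchanger.
pump = Subsystem("pump", ports=["in", "out", "power"])
exchanger = Subsystem("heat_exchanger", ports=["in", "out"])
loop = Subsystem(
    "thermal_loop",
    children=[pump, exchanger],
    connections={"fluid": [("pump.out", "heat_exchanger.in")]},
)
```

Selecting a view filters the connections shown on a level, which mirrors how an aspect and view determine the type of connections in the text.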

For  purposes  of  FDIR,  the  system  models  are  broken  down  into  two  primary  hierarchies  - 
the  physical  assembly,  which  models  the  components  in  the  system,  and  the  functional  de- 
composition. The  physical  models  and  the  functional  models  are  both  described  in  terms  of 
their  structure  and  behavior.  Separation  of  the  functional  and  physical  structure  is  an  essen- 
tial difference  from  the  primarily  component-oriented  modeling  in  model-based  diagnostic 
systems  (e.g.  [1,  2,  3]).  Our  rationale  for  this  separation  is  the  following  : 


1. There are many examples of multi-function components that are involved in the
implementation of several functionalities at the same time. Well known examples are
computers and energy distribution systems.

Table 1: Physical Model Aspects

Aspect                    Concept(s) Modeled           Model Elements
------------------------  ---------------------------  -----------------------------------
Assembly Aspect           Component assemblies and     Sub-components and input and
                          energy and material flows    output flows
State Transitions Aspect  Operation states and State   Component states, sub-states, state
                          Transition Machine           transitions, local, input and
                                                       output events
Alarm Generation Aspect   Sensors                      Alarms that the sensor generates,
                                                       and sensor attributes like cost and
                                                       time to use, probability of false
                                                       alarm, etc.
Component Faults Aspect   Faulty behavior              Failure modes and failure rates of
                                                       components

2.  Assignments  among  physical  components  and  functionalities  are  not  always  static. 
Physical  redundancy  and  the  use  of  multiple-function  components  are  frequently  used 
to  achieve  fault  tolerance  in  critical  systems. 

Both  the  physical  and  functional  models  have  many  different  aspects.  In  this  paper  we 
present  only  a very  brief  description  of  the  different  modeling  aspects  of  physical  and  func- 
tional models,  given  in  Table  1 and  Table  2.  For  a more  detailed  description,  the  reader  is 
referred  to  [14]. 

Fault Modeling: Model-based diagnostic systems work with a model (a suitable
representation) of the system. The level of detail in the models can be kept at the level required
by the FDIR requirements. These diagnostic systems interpret the observed discrepancies
in the context of the system model. There are primarily two kinds of models that have
traditionally been used for diagnosis: functional models and fault models.

Functional  models  (also  called  behavioral  models)  describe  the  “correct”  behavior  of  the 
system, i.e., how the system is supposed to behave when no faults are present. The level
of  abstraction  in  the  functional  models  can  vary  from  system  to  system,  depending  on  the 
application  - from  quantitative  models  (also  called  analytical  models)  using  state-space  rep- 
resentation to  qualitative  models. 

Quantitative  functional  models  (analytical  models)  use  Ordinary  Differential  Equations 
(ODE),  state-space  or  similar  representations  (examples  of  such  systems  can  be  found 
in  [11,  12,  13],  among  others),  to  diagnose  anomalies.  However,  the  usefulness  of  analyt- 
ical models  is  limited  to  small,  stable  sub-systems  only,  which  have  a well  defined  and  simple 
domain  theory,  as  opposed  to  the  large-scale,  complex  systems  addressed  in  this  paper. 

Table 2: Functional Model Aspects

Aspect                       Concept(s) Modeled           Model Elements
---------------------------  ---------------------------  -----------------------------------
Structure Aspect             Functional structure and     Sub-functionalities and input and
                             energy and material flows    output in the four disciplines
                                                          (signal, fluid, electrical and
                                                          mechanical)
State Transitions Aspect     Operation states and State   Functionality states, sub-states,
                             Transition Machine           state transitions, local, input and
                                                          output events
Failure Propagations Aspect  Failure interactions         Component failure modes, functional
                                                          discrepancies and timed failure
                                                          propagations
Failure Observation Aspect   Fault monitoring             Alarms and sensor states
Implementation Aspect        Relationship between         The physical components that
                             functionalities and          fulfill the functionality and the
                             components in the system     redundancy between the components

To  address  the  complexity  in  most  engineering  systems,  some  researchers  have  used  qual- 
itative functional  models.  Qualitative  functional  models  divide  the  process  variable  space 
into  “ranges  of  interest”  and  use  qualitative  physics  to  generate  the  behavior  of  a system. 
They  have  met  with  varying  degrees  of  success  in  analyzing  and  predicting  the  complete  and 
correct  behavior.  The  functionality  can  be  described  using  just  input/output  relationships 
as in [4], using a mathematical description, or using a set of connected components and
causal  sequences  which  give  a description  of  how  the  system  behaves  [1,  2,  3]. 
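To make the idea of dividing a process variable into “ranges of interest” concrete, here is a minimal sketch of mapping a numeric reading to a qualitative label. The function name, the thresholds, and the labels are illustrative assumptions and are not taken from the cited qualitative modeling work.

```python
from bisect import bisect_right


def qualitative_value(reading, thresholds, labels):
    """Map a numeric reading to the label of the range it falls into.

    thresholds must be sorted in ascending order; labels must contain
    exactly one more entry than thresholds. A reading equal to a
    threshold is assigned to the range above it.
    """
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly len(thresholds) + 1 labels")
    return labels[bisect_right(thresholds, reading)]


# Hypothetical coolant temperature ranges (degrees Celsius).
TEMP_THRESHOLDS = [10.0, 30.0]
TEMP_LABELS = ["low", "normal", "high"]
```

For example, `qualitative_value(22.5, TEMP_THRESHOLDS, TEMP_LABELS)` falls between the two thresholds and is reported as `"normal"`.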

Using  functional  models  to  diagnose  faults  has  its  own  problems,  the  foremost  being  the 
accuracy  and  validity  of  models,  particularly  if  faults  are  present.  Further,  while  the  models 
might  be  good  for  identifying  the  presence  of  a malfunction  (using  simulation  or  analytical 
methods),  they  are  not  necessarily  helpful  in  diagnosing,  i.e.,  locating  the  faulty  compo- 
nent. This  is  because  using  functional  models  can  lead  to  an  explosion  in  the  size  of  the 
diagnostic  search  space  and  hence  the  number  of  possible  hypotheses,  thereby  rendering  di- 
agnosis intractable.  The  large  diagnostic  search  spaces  arise  out  of  the  attempt  to  reason 
about abnormal behavior of a system using models that describe the behavior under normal
conditions.  Except  in  limited  cases,  such  attempts  have  not  been  successful. 

Fault models (fault trees, cause-consequence diagrams, diagnostic dictionaries, etc.), as
opposed to functional models, describe system behavior when faults are present. These models
use a qualitative representation of faults, discrepancies and their interactions. This is done by
discretizing the failure space of the systems in terms of the failure modes of components,
functional discrepancies, alarms, etc. Such fault analysis of systems is standard practice in
systems engineering (e.g., FMEA, fault trees, etc.) and has been used to diagnose faults.
Since  our  goal  is  to  develop  a modeling  environment  which  is  based  on  the  concepts  and 
relations  used  by  systems  engineers,  a fault  model  representation  is  better  suited  for  our 
purpose. 

Fault models help in diagnosis by reducing the diagnostic search space. Hypothesis
generation is straightforward: consider all the failure modes that could have caused the
discrepancies. Diagnosing under a single-fault assumption is simple. Diagnosing under an
assumption of multiple faults and/or sensor failures can result in a large number of
combinations of faults to be examined. In this case, reasonable heuristics derived from the
structure of the system can be used.
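The hypothesis generation step described above can be sketched as a backward search over a fault model. This is only an illustration under simplifying assumptions (an untimed directed graph, perfectly reliable observations); the function name and the example failure modes are hypothetical and do not reproduce the algorithms of [9, 10].

```python
def candidate_failure_modes(propagations, failure_modes, observed):
    """Return, per observed discrepancy, the failure modes that can reach it,
    plus the failure modes that explain all observations at once
    (the candidates under a single-fault assumption)."""
    # Reverse adjacency: effect -> set of direct causes.
    parents = {}
    for cause, effect in propagations:
        parents.setdefault(effect, set()).add(cause)

    def causes_of(node):
        # Depth-first search backwards along the propagation edges.
        seen, stack = set(), [node]
        while stack:
            current = stack.pop()
            for parent in parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen & failure_modes

    per_observation = {d: causes_of(d) for d in observed}
    single_fault = (
        set.intersection(*per_observation.values()) if per_observation else set()
    )
    return per_observation, single_fault


# Hypothetical fault model for a cooling loop.
PROPAGATIONS = [
    ("pump_failed", "low_flow"),
    ("valve_stuck", "low_flow"),
    ("low_flow", "high_temp"),
    ("heater_stuck_on", "high_temp"),
]
FAILURE_MODES = {"pump_failed", "valve_stuck", "heater_stuck_on"}
per_obs, single = candidate_failure_modes(
    PROPAGATIONS, FAILURE_MODES, {"low_flow", "high_temp"}
)
```

Under the single-fault assumption the candidates are the intersection of the per-observation cause sets; relaxing that assumption means examining combinations of the per-observation sets, which is where the combinatorial growth mentioned above comes from.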

Fault models using diagnostic dictionaries (the kind used in [7, 8], etc.) provide a simple
mapping from faults to effects. They list the effects of a fault once all the propagations have
taken place and the system has reached a steady state. Thus, the temporal relationships
between faults and the dynamics are lost, making this representation less attractive for the
FDIR task.

The  temporal  relationships  and  the  dynamics  are  captured  in  the  fault  models  using  directed 
graphs  (as  in  [5,  6]).  In  the  research  described  in  this  paper,  we  use  a similar  representation. 
This  is  done  in  the  following  manner  (for  a more  detailed  description,  see  [14,  9])  : 

1.  Discretize  the  physical  and  functional  failure  space  to  model  only  the  plausible  fault 
states,  called  failure  modes  and  discrepancies,  respectively. 

2.  Discretize  the  observation  space  to  correspond  to  the  discretized  failure  space,  speci- 
fying the  discrete  alarms  and  sensor  states.  Describe  the  observation  of  failures  using 
alarms  and  sensor  states. 

3. Specify component and functional boundaries and the input and output failure
interfaces.

4.  Describe  the  interactions  between  failures  in  terms  of  timed  failure  propagations,  which 
capture  the  dynamics  of  system  behavior  when  it  is  out  of  the  normal  operation  range. 
The  uncertainty  in  dynamics  is  expressed  by  using  a propagation  interval.  The  failure 
propagations  can  describe  the  interactions  of  failures  within  a sub-system  or  between 
sub-systems. 
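The steps above can be illustrated with a small sketch of a timed failure propagation graph in which each edge carries a propagation interval. The sketch assumes that a discrepancy becomes active as soon as any one of its causes has propagated to it (an "OR" interpretation) and that all delays are non-negative; the function name, the edge format, and the example delays are assumptions for illustration, not the formalism of [14, 9].

```python
import heapq


def propagation_window(edges, source):
    """Bound the time at which each reachable node can become active.

    edges: {cause: [(effect, t_min, t_max), ...]} with non-negative delays.
    Returns {node: (earliest, latest)} measured from the source's onset.
    Under the OR assumption both bounds are shortest-path distances,
    computed over the minimum and maximum delays respectively.
    """
    def shortest(delay_index):
        dist = {source: 0.0}
        heap = [(0.0, source)]
        while heap:
            d, node = heapq.heappop(heap)
            if d > dist.get(node, float("inf")):
                continue  # stale heap entry
            for edge in edges.get(node, []):
                effect, candidate = edge[0], d + edge[delay_index]
                if candidate < dist.get(effect, float("inf")):
                    dist[effect] = candidate
                    heapq.heappush(heap, (candidate, effect))
        return dist

    earliest, latest = shortest(1), shortest(2)
    return {node: (earliest[node], latest[node]) for node in earliest}


# Hypothetical propagation intervals, in seconds.
TFPG = {
    "pump_failed": [("low_flow", 1, 3), ("high_temp", 8, 9)],
    "low_flow": [("high_temp", 5, 10)],
}
windows = propagation_window(TFPG, "pump_failed")
```

The resulting window for each discrepancy expresses the uncertainty in the dynamics: an alarm observed outside its window is not consistent with the assumed failure mode.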

The fault-modeling method outlined in steps 1 to 4 addresses the challenges of the FDIR
task outlined earlier. The use of these models for real-time robust diagnostics during system
operation is briefly described in the next section.

REAL-TIME ROBUST DIAGNOSTICS: An embedded robust diagnostic system was
developed, which is synthesized from the hierarchical fault models. The goal was to
develop diagnostic software that does not have a pre-defined structure; instead, the
structure of the diagnostic system is derived from the structure of the system, as captured in
the models. Thus, the overall diagnostic system consists of many monitoring and diagnostic
sub-systems, as shown in Figure 1.


Figure  1:  Block  Diagram  of  Robust  Diagnostic  System 


The diagnostic system structure is determined by the functional hierarchy in the models.
For each functionality, a monitoring sub-system, the Functionality Monitor (FM), and a
diagnostic sub-system, the Functionality Diagnoser (FD), are generated. The interfaces of
these sub-systems are determined by the interfaces in the models. An FM receives the
sensor signals pertinent to the functionality it represents (as specified in the model of the
functionality). If an alarm condition exists, or if the sensor signatures change, the FM
generates “diagnostic events” and sends them to the Inter-Level Coordinator (ILC). The
FDs are also generated according to the functional breakdown, with one FD for
each functionality. The interfaces of the FDs (incoming and outgoing failures and diagnostic
events) are determined by the respective functionality models.

The ILC (1) receives the diagnostic events from the FMs, (2) routes the events to the proper
FDs, (3) controls and guides the diagnostic search, (4) receives the local diagnostic results from
the FDs, and (5) combines the local diagnostic results and generates the fault hypotheses.
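The division of labor among FMs, FDs and the ILC can be sketched as follows. This is a deliberately simplified illustration: the class and method names, the threshold-based alarm test, and the alarm-to-cause lookup are all assumptions, and the sketch omits the search control and hierarchy handling of the actual system described in [9, 10].

```python
class FunctionalityMonitor:
    """Turns sensor readings for one functionality into diagnostic events."""

    def __init__(self, functionality, alarm_limits):
        self.functionality = functionality
        self.alarm_limits = alarm_limits  # sensor -> (low, high) normal range
        self.alarm_state = {}             # sensor -> last reported alarm status

    def update(self, sensor, value):
        low, high = self.alarm_limits[sensor]
        alarmed = not (low <= value <= high)
        if self.alarm_state.get(sensor, False) == alarmed:
            return None  # no change in status, so no diagnostic event
        self.alarm_state[sensor] = alarmed
        return (self.functionality, sensor, alarmed)


class FunctionalityDiagnoser:
    """Produces a local diagnostic result for one functionality."""

    def __init__(self, functionality, alarm_causes):
        self.functionality = functionality
        self.alarm_causes = alarm_causes  # sensor -> set of suspect failure modes
        self.active_alarms = set()

    def handle(self, event):
        _, sensor, alarmed = event
        if alarmed:
            self.active_alarms.add(sensor)
        else:
            self.active_alarms.discard(sensor)
        suspects = set()
        for active in self.active_alarms:
            suspects |= self.alarm_causes.get(active, set())
        return suspects


class InterLevelCoordinator:
    """Routes events to the proper FD and combines the local results."""

    def __init__(self, diagnosers):
        self.diagnosers = {d.functionality: d for d in diagnosers}
        self.local_results = {}

    def dispatch(self, event):
        functionality = event[0]
        self.local_results[functionality] = self.diagnosers[functionality].handle(event)
        return {f: sorted(s) for f, s in self.local_results.items() if s}
```

A monitor only emits an event when an alarm status changes, so the coordinator and diagnosers do work only when the alarm scenario evolves, which matches the event-driven character of the system.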

The  prominent  features  of  the  diagnostic  system  are  : 


• Diagnosis  of  multiple  faults  (no  assumption  of  single  or  multiple  points  of  failures). 

• Identification  of  observation  errors. 
• Robustness against a large number of sensor failures and graceful degradation as the
number of sensor failures increases.

• It  is  event-driven  and  uses  incremental  non-monotonic  reasoning. 

• It  predicts  future  events  and  uses  the  predictor-corrector  principle  to  revise  its  hy- 
potheses. 

• Restricts  the  diagnostic  search  to  the  relevant  parts  of  the  functional  hierarchy. 

• Identifies  loss  of  model  validity  in  case  of  large  faults  and  restricts  its  search  to  those 
parts  of  the  hierarchy  where  the  model  of  the  system  seems  to  be  valid. 

• Uses  algorithms  of  polynomial  complexity. 

For  details  of  the  robust  diagnostic  system  and  the  algorithms  used,  please  see  [9,  10]. 
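The predictor-corrector principle listed among the features can be illustrated with a small sketch: a hypothesis predicts time windows for future alarms, and each new observation (or the absence of one) is checked against those predictions. The function name, the four status labels, and the example times are assumptions for illustration; the actual revision logic is described in [9, 10].

```python
def check_hypothesis(expected_windows, observed_times, now):
    """Compare the alarms predicted by one fault hypothesis with observations.

    expected_windows: alarm -> (earliest, latest) predicted activation times.
    observed_times:   alarm -> time at which the alarm was actually seen.
    now:              current time.
    """
    report = {}
    for alarm, (earliest, latest) in expected_windows.items():
        seen_at = observed_times.get(alarm)
        if seen_at is not None:
            in_window = earliest <= seen_at <= latest
            report[alarm] = "consistent" if in_window else "inconsistent"
        elif now > latest:
            # The window has closed without the alarm: either the hypothesis
            # is wrong or the sensor failed to report.
            report[alarm] = "missing"
        else:
            report[alarm] = "pending"
    for alarm in observed_times:
        if alarm not in expected_windows:
            report[alarm] = "unexplained"
    return report


# Hypothetical prediction for the hypothesis "pump_failed at t = 0".
EXPECTED = {"low_flow": (1, 3), "low_pressure": (2, 4), "high_temp": (6, 13)}
report = check_hypothesis(EXPECTED, {"low_flow": 2, "vibration": 3}, now=5)
```

Alarms marked "missing" or "unexplained" are the cases in which a diagnoser would have to either revise the hypothesis or suspect an observation error.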

CONCLUSION:  A model-integrated  approach  to  FDIR  of  complex,  large-scale  systems 
was  presented.  Although  the  primary  motivation  for  this  research  came  from  the  FDIR 
requirements  for  ISSA,  the  approach  used  here  could  be  used  for  a variety  of  engineering 
systems,  since  it  provides  a solution  approach  for  FDIR  modeling  and  embedded  health 
management for any complex, large-scale engineering system. The modeling formalisms are
derived from standard engineering practices and domain-specific concepts and relations,
making them more accessible to systems engineers. The structural and functional organization
of  models  makes  the  complexity  and  the  scale  of  the  systems  easier  to  tackle.  The  feasibility 
of  the  model-integrated  approach  for  using  design  information  to  formally  and  automati- 
cally derive  embedded  health  management  systems  is  demonstrated  by  the  real-time  robust 
diagnostic  system  synthesized  from  the  models. 